Learning from Video - yuyan

Robot Foundation Models

Robot Systems

Video Foundation Models

EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark

https://arxiv.org/abs/2510.06218

Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos

https://arxiv.org/abs/2412.04445

HumanoidExo: Scalable Whole-Body Humanoid Manipulation via Wearable Exoskeleton

https://arxiv.org/pdf/2510.03022

VideoMimic

https://www.videomimic.net/

Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

https://openaccess.thecvf.com/content_ECCV_2018/papers/Dima_Damen_Scaling_Egocentric_Vision_ECCV_2018_paper.pdf

Challenges and Trends in Egocentric Vision: A Survey

https://arxiv.org/html/2503.15275v1

Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions

https://arxiv.org/html/2508.04681v1

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

https://www.roboticsproceedings.org/rss14/p02.pdf

EgoMimic: Scaling Imitation Learning via Egocentric Video

https://openreview.net/pdf/da5952e56d3c5b2704518851708c6a97e0a43d28.pdf

Automatic Generation of Two-Level Hierarchical Tutorials from Instructional Makeup Videos

https://hci.stanford.edu/publications/2021/truong_auto/truong_2021.pdf

ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions

https://openaccess.thecvf.com/content/CVPR2025/papers/Soucek_ShowHowTo_Generating_Scene-Conditioned_Step-by-Step_Visual_Instructions_CVPR_2025_paper.pdf

Aligning Step-by-Step Instructional Diagrams to Video Demonstrations

https://openaccess.thecvf.com/content/CVPR2023/papers/Zhang_Aligning_Step-by-Step_Instructional_Diagrams_to_Video_Demonstrations_CVPR_2023_paper.pdf

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips

https://openaccess.thecvf.com/content_ICCV_2019/papers/Miech_HowTo100M_Learning_a_Text-Video_Embedding_by_Watching_Hundred_Million_Narrated_ICCV_2019_paper.pdf

Multimodal Language Models for Domain-Specific Procedural Video Summarization

https://scispace.com/pdf/multimodal-language-models-for-domain-specific-procedural-1to3soi699.pdf

Screencast Tutorial Video Understanding

https://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Screencast_Tutorial_Video_Understanding_CVPR_2020_paper.pdf

Learning To Recognize Procedural Activities with Distant Supervision

https://openaccess.thecvf.com/content/CVPR2022/papers/Lin_Learning_To_Recognize_Procedural_Activities_With_Distant_Supervision_CVPR_2022_paper.pdf

A Comprehensive Survey of Procedural Video Datasets

https://www.sciencedirect.com/science/article/abs/pii/S1077314220301314